
Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

Neural Information Processing Systems

Offline reinforcement learning (RL) enables learning a decision-making policy without interaction with the environment. This makes it particularly beneficial in situations where such interactions are costly. However, a known challenge for offline RL algorithms is the distributional mismatch between the state-action distributions of the learned policy and the dataset, which can significantly impact performance. State-of-the-art algorithms address this by constraining the policy to align with the state-action pairs in the dataset. However, this strategy struggles on datasets that predominantly consist of trajectories collected by low-performing policies, with only a few trajectories from high-performing ones. Indeed, the constraint to align with the data leads the policy to imitate the low-performing behaviors that predominate the dataset. Our key insight for addressing this issue is to constrain the policy toward the policy that collected the high-performing parts of the dataset rather than toward all of the data. To this end, we optimize the importance sampling weights to emulate sampling from a data distribution generated by a nearly optimal policy. Our method achieves considerable performance gains (up to five times better) over existing approaches when plugged into state-of-the-art offline RL algorithms, across 72 imbalanced datasets with varying types of imbalance.
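
A minimal sketch of the weighting idea, assuming only that per-trajectory returns are available; the temperature and all names here are illustrative, not the authors' exact scheme:

    import numpy as np

    def return_weighted_probs(trajectory_returns, temperature=1.0):
        """Turn per-trajectory returns into sampling probabilities.

        Higher-return trajectories are drawn more often, emulating a
        dataset collected by a better-performing behavior policy. The
        (illustrative) temperature controls how sharply the weights
        concentrate on the best trajectories.
        """
        r = np.asarray(trajectory_returns, dtype=np.float64)
        z = (r - r.max()) / temperature        # shift for numerical stability
        w = np.exp(z)
        return w / w.sum()

    # Toy imbalanced dataset: many low-return trajectories, a few good ones.
    rng = np.random.default_rng(0)
    returns = np.concatenate([rng.normal(10, 2, size=95),   # low performers
                              rng.normal(90, 2, size=5)])   # rare experts
    probs = return_weighted_probs(returns, temperature=5.0)
    batch = rng.choice(len(returns), size=32, p=probs)      # replaces uniform sampling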


Uniform Sampling over Episode Difficulty

Neural Information Processing Systems

Episodic training is a core ingredient of few-shot learning, used to train models on tasks with limited labelled data. Despite its success, episodic training remains largely understudied, prompting us to ask: what is the best way to sample episodes? In this paper, we first propose a method to approximate episode sampling distributions based on their difficulty. Building on this method, we perform an extensive analysis and find that sampling uniformly over episode difficulty outperforms other sampling schemes, including curriculum and easy-/hard-mining. As the proposed sampling method is algorithm-agnostic, we can leverage these insights to improve few-shot learning accuracies across many episodic training algorithms. We demonstrate the efficacy of our method across popular few-shot learning datasets, algorithms, network architectures, and protocols.
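
One way to realize uniform sampling over difficulty, sketched under the assumption that a scalar difficulty score per episode is already available (the paper's own difficulty estimate is not reproduced here): weight each episode by the inverse of the empirical density of its difficulty, so that drawn episodes are spread roughly uniformly along the difficulty axis.

    import numpy as np

    def uniform_over_difficulty_weights(difficulties, n_bins=20):
        """Importance weights that make sampled difficulty ~ uniform.

        Common difficulties are down-weighted and rare ones up-weighted
        (weight = 1 / empirical density), so sampling with these weights
        approximates a uniform draw over the difficulty axis.
        """
        d = np.asarray(difficulties, dtype=np.float64)
        counts, edges = np.histogram(d, bins=n_bins)
        bin_idx = np.clip(np.digitize(d, edges[1:-1]), 0, n_bins - 1)
        density = counts[bin_idx] / len(d)     # density of each episode's bin
        w = 1.0 / np.maximum(density, 1e-12)
        return w / w.sum()

    rng = np.random.default_rng(1)
    difficulty = rng.beta(2, 5, size=10_000)   # stand-in difficulty scores
    p = uniform_over_difficulty_weights(difficulty)
    episode_batch = rng.choice(len(difficulty), size=256, p=p)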


Distributional Properties of Subword Regularization

Cognetta, Marco, Zouhar, Vilém, Okazaki, Naoaki

arXiv.org Artificial Intelligence

Subword regularization, used widely in NLP, improves model performance by reducing the dependency on exact tokenizations, augmenting the training corpus, and exposing the model to more unique contexts during training. BPE and MaxMatch, two popular subword tokenization schemes, have stochastic dropout regularization variants, but the distributions they form have not been analyzed. We show that these stochastic variants are heavily biased towards a small set of tokenizations per word. If subword regularization indeed confers the benefits described above, we hypothesize that this bias artificially limits the effectiveness of these schemes. We therefore propose an algorithm to uniformly sample tokenizations, which we use as a drop-in replacement for the stochastic aspects of existing tokenizers, and find that it improves machine translation quality.
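
The uniform sampler can be illustrated with the classic count-then-sample dynamic program (a sketch of the standard construction, not necessarily the paper's exact algorithm): count the segmentations of every suffix, then walk left to right, choosing each candidate token with probability proportional to the number of completions it admits.

    import random

    def count_segmentations(word, vocab):
        """counts[i] = number of ways to tokenize word[i:] using vocab."""
        n = len(word)
        counts = [0] * (n + 1)
        counts[n] = 1
        for i in range(n - 1, -1, -1):
            for j in range(i + 1, n + 1):
                if word[i:j] in vocab:
                    counts[i] += counts[j]
        return counts

    def sample_uniform_tokenization(word, vocab, rng=random):
        """Draw one tokenization uniformly from all valid segmentations.

        At position i, token word[i:j] admits counts[j] completions, so
        choosing it with probability counts[j] / counts[i] makes every
        full segmentation equally likely.
        """
        counts = count_segmentations(word, vocab)
        if counts[0] == 0:
            raise ValueError("word cannot be segmented with this vocab")
        tokens, i = [], 0
        while i < len(word):
            r = rng.randrange(counts[i])
            for j in range(i + 1, len(word) + 1):
                if word[i:j] in vocab:
                    if r < counts[j]:
                        tokens.append(word[i:j])
                        i = j
                        break
                    r -= counts[j]
        return tokens

    vocab = {"un", "i", "form", "uni", "f", "o", "r", "m", "unif", "orm"}
    print(sample_uniform_tokenization("uniform", vocab))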


Balanced Data Sampling for Language Model Training with Clustering

Shao, Yunfan, Li, Linyang, Fei, Zhaoye, Yan, Hang, Lin, Dahua, Qiu, Xipeng

arXiv.org Artificial Intelligence

Data plays a fundamental role in the training of Large Language Models (LLMs). While attention has been paid to the collection and composition of datasets, determining the data sampling strategy during training remains an open question. Most LLMs are trained with a simple strategy, random sampling. However, this sampling strategy ignores the unbalanced nature of the training data distribution, which can be sub-optimal. In this paper, we propose ClusterClip Sampling to balance the text distribution of training data for better model training. Specifically, ClusterClip Sampling utilizes data clustering to reflect the data distribution of the training set and balances common and rare samples during training based on the cluster results. A repetition clip operation is introduced to mitigate the overfitting caused by samples from certain clusters. Extensive experiments validate the effectiveness of ClusterClip Sampling, which outperforms random sampling and other cluster-based sampling variants across various training datasets and large language models.
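
A sketch of the idea under illustrative parameters (this is not the paper's exact ClusterClip procedure): up-weight examples from rare clusters so each cluster contributes roughly equal probability mass, and cap how often any single example recurs.

    import numpy as np

    def cluster_balanced_sample(cluster_ids, n_draws, max_repeats=4, rng=None):
        """Draw indices with per-cluster balancing and a repetition clip.

        Each cluster gets equal total sampling mass, and `max_repeats`
        (an assumed knob) caps how often one example is reused, limiting
        overfitting to tiny clusters.
        """
        rng = rng or np.random.default_rng()
        ids = np.asarray(cluster_ids)
        _, inverse, sizes = np.unique(ids, return_inverse=True, return_counts=True)
        w = 1.0 / sizes[inverse]               # equal mass per cluster
        p = w / w.sum()
        chosen, repeats = [], np.zeros(len(ids), dtype=int)
        while len(chosen) < n_draws:
            i = rng.choice(len(ids), p=p)
            if repeats[i] < max_repeats:
                repeats[i] += 1
                chosen.append(i)
            else:                              # clip: retire this example
                p[i] = 0.0
                if p.sum() == 0:
                    break
                p = p / p.sum()
        return np.array(chosen)

    rng = np.random.default_rng(2)
    clusters = np.concatenate([np.zeros(9000, int), np.ones(900, int), np.full(100, 2)])
    batch = cluster_balanced_sample(clusters, n_draws=1000, rng=rng)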


Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

Hong, Zhang-Wei, Kumar, Aviral, Karnik, Sathwik, Bhandwaldar, Abhishek, Srivastava, Akash, Pajarinen, Joni, Laroche, Romain, Gupta, Abhishek, Agrawal, Pulkit

arXiv.org Artificial Intelligence

Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning techniques such as behavior cloning is to find a policy that achieves a higher average return than the trajectories constituting the dataset. However, we empirically find that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset. We argue this is due to an assumption made by current offline RL algorithms of staying close to the trajectories in the dataset. If the dataset primarily consists of sub-optimal trajectories, this assumption forces the policy to mimic the suboptimal actions. We overcome this issue by proposing a sampling strategy that enables the policy to be constrained only to "good data" rather than all actions in the dataset (i.e., uniform sampling). We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms. Our evaluation demonstrates significant performance gains across 72 imbalanced datasets derived from D4RL and three different offline RL algorithms. Code is available at https://github.com/Improbable-AI/dw-offline-rl.
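
As a usage sketch of the plug-and-play claim (illustrative tensors and weights, not code from the linked repository): in PyTorch, swapping a uniformly shuffled DataLoader for a WeightedRandomSampler is the only change a standard offline RL training loop needs.

    import torch
    from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

    # Stand-in dataset and per-transition weights, e.g. derived from the
    # return of the trajectory each transition came from (see the
    # weighting sketch after the first abstract above).
    states  = torch.randn(10_000, 17)
    actions = torch.randn(10_000, 6)
    weights = torch.rand(10_000)

    dataset = TensorDataset(states, actions)
    sampler = WeightedRandomSampler(weights, num_samples=len(dataset),
                                    replacement=True)

    # Replacing `shuffle=True` (uniform sampling) with this sampler is the
    # entire integration; the offline RL update itself is untouched.
    loader = DataLoader(dataset, batch_size=256, sampler=sampler)
    for batch_states, batch_actions in loader:
        pass  # feed the batch to any off-the-shelf offline RL algorithm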


Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds

Baykal, Cenk, Liebenwein, Lucas, Gilitschenski, Igor, Feldman, Dan, Rus, Daniela

arXiv.org Machine Learning

The deployment of state-of-the-art neural networks containing millions of parameters to resource-constrained platforms may be prohibitive in terms of both time and space. In this work, we present an efficient coresets-based neural network compression algorithm that provably sparsifies the parameters of a trained feedforward neural network in a manner that approximately preserves the network's output. Our approach is based on an importance sampling scheme that judiciously defines a sampling distribution over the neural network parameters, and as a result, retains parameters of high importance while discarding redundant ones. Our method and analysis introduce an empirical notion of sensitivity and extend traditional coreset constructions to the application of compressing parameters. Our theoretical analysis establishes both instance-dependent and -independent bounds on the size of the resulting compressed neural network as a function of the user-specified tolerance and failure probability parameters. As a corollary to our practical compression algorithm, we obtain new generalization bounds that may provide novel insights into the generalization properties of neural networks.
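
A toy version of the importance-sampling step, using weight magnitude as a stand-in for the paper's empirical sensitivity: sample entries in proportion to the proxy and rescale kept entries by the inverse of their expected draw count, so the sparsified layer equals the original in expectation.

    import numpy as np

    def sparsify_layer(W, keep, rng=None):
        """Unbiased importance-sampling sparsification of a weight matrix."""
        rng = rng or np.random.default_rng()
        flat = W.ravel()
        p = np.abs(flat) / np.abs(flat).sum()  # sensitivity-proxy distribution
        idx = rng.choice(flat.size, size=keep, replace=True, p=p)
        sparse = np.zeros_like(flat)
        # Rescaling by 1 / (keep * p) makes E[sparse] equal to flat.
        np.add.at(sparse, idx, flat[idx] / (keep * p[idx]))
        return sparse.reshape(W.shape)

    rng = np.random.default_rng(3)
    W = rng.normal(size=(256, 256))
    W_hat = sparsify_layer(W, keep=8_000, rng=rng)
    x = rng.normal(size=256)
    # The compressed layer approximately preserves the layer's output.
    print(np.linalg.norm(W @ x - W_hat @ x) / np.linalg.norm(W @ x))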


Counting and Uniform Sampling from Markov Equivalent DAGs

Ghassami, AmirEmad, Salehkaleybar, Saber, Kiyavash, Negar

arXiv.org Machine Learning

Directed acyclic graphs (DAGs) are the most commonly used graphical model to represent causal relationships among a set of variables. In a DAG representation, a directed edge indicates a direct causal relationship between the corresponding variables. Under the Markov property and faithfulness assumptions, conditional d-separation of variables in a DAG is in bijective correspondence with conditional independencies of the variables in the underlying joint probability distribution (Spirtes et al., 2000), and hence, a DAG representation demonstrates conditional independencies among its variables. The general approach for learning a causal structure is to use statistical data from the variables to find a DAG which is the most consistent with the conditional independencies in the given data. However, a DAG representation of a set of conditional independencies is not always unique. This restricts the learning of the causal structure to Markov equivalence classes (MECs), where elements of each class represent the same set of conditional independencies.
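
The Verma-Pearl characterization behind MECs, that two DAGs are Markov equivalent exactly when they share the same skeleton and the same v-structures, can be made concrete by brute-force enumeration on a toy graph. The paper's contribution is performing the counting and uniform sampling efficiently; this sketch does not attempt that and only illustrates the criterion.

    import random
    from itertools import combinations, product

    NODES = ("X", "Y", "Z")
    PAIRS = list(combinations(NODES, 2))

    def is_acyclic(edges):
        """Kahn-style check: repeatedly peel nodes with no remaining parents."""
        nodes, rem = set(NODES), set(edges)
        while nodes:
            roots = {u for u in nodes if not any(v == u for _, v in rem)}
            if not roots:
                return False
            nodes -= roots
            rem = {(a, b) for a, b in rem if a in nodes}
        return True

    def v_structures(edges):
        """Colliders a -> c <- b with a and b non-adjacent."""
        skel = {frozenset(e) for e in edges}
        return {(frozenset((a, b)), c)
                for a, b in PAIRS for c in NODES
                if (a, c) in edges and (b, c) in edges
                and frozenset((a, b)) not in skel}

    def equivalent(g, h):
        """Markov equivalent <=> same skeleton and same v-structures."""
        return ({frozenset(e) for e in g} == {frozenset(e) for e in h}
                and v_structures(g) == v_structures(h))

    # Enumerate all 25 DAGs on three labelled nodes: each unordered pair
    # is absent, oriented one way, or the other; keep acyclic choices.
    all_dags = [g for choice in product(range(3), repeat=len(PAIRS))
                if is_acyclic(g := frozenset(
                    (a, b) if c == 1 else (b, a)
                    for (a, b), c in zip(PAIRS, choice) if c))]

    chain = frozenset({("X", "Y"), ("Y", "Z")})    # X -> Y -> Z
    mec = [h for h in all_dags if equivalent(chain, h)]
    print(len(mec))            # 3: both chains and the fork X <- Y -> Z
    print(random.choice(mec))  # a uniform draw from the equivalence class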